
fix(datasets): increase create version request timeout #389

Merged
jared-paperspace merged 2 commits into master from jlunde/pla-1127-dataset-upload-cli-command-not-working
Jun 22, 2022

Conversation

@jared-paperspace (Contributor) commented Jun 22, 2022

After some digging, it turns out that larger files were timing out due to the default timeout of 5 seconds. Since the errors happen inside the worker pool, they are never reported to the user.

  • Increases the default timeout to 5 minutes per 15MB file
  • Better use of the requests.Session() context so that connection pooling is used and connections to a given host are maintained between requests. This should yield slightly better performance for parallel uploads.
  • Report progress to the user more often (every chunk). There is still much room for improvement here :)
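The "5 minutes per 15MB" rule could be expressed as a small helper. This is a sketch with assumed names (`PART_MINSIZE`, `upload_timeout`), not the PR's actual code:

```python
import math

PART_MINSIZE = int(15e6)   # 15 MB, the multipart part size
TIMEOUT_PER_PART = 300     # 5 minutes per 15 MB, in seconds

def upload_timeout(size_bytes):
    # Scale the request timeout with file size: one 5-minute
    # budget for every 15 MB (or fraction thereof).
    parts = max(1, math.ceil(size_bytes / PART_MINSIZE))
    return parts * TIMEOUT_PER_PART
```

A 75 MB file would then get a 25-minute budget instead of the old flat 5 seconds.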

Screenshots

Screen Shot 2022-06-21 at 8 28 52 PM

Screen.Recording.2022-06-21.at.8.25.35.PM.mov

@jared-paperspace jared-paperspace self-assigned this Jun 22, 2022
```python
    headers.update({'Content-Size': '0'})
    r = session.put(url, data='', headers=headers, timeout=5)
# for files under 15MB
elif size <= (15e6):
```
jared-paperspace (author):
to 15MB from 500MB

```python
elif size <= (15e6):
    with open(path, 'rb') as f:
        r = session.put(
            url, data=f, headers=headers, timeout=300)
```
jared-paperspace (author):
Increase timeout from 5 seconds to 5 minutes

```python
    presigned_url,
    data=chunk,
    headers=headers,
    timeout=300)
```
jared-paperspace (author):
Increase timeout from 5 seconds to 5 minutes

Review comment:

maybe this timeout can be pulled out too... 🤷

```python
part_res = session.put(
    presigned_url,
    data=chunk,
    headers=headers,
```
jared-paperspace (author):
Add headers

```python
# console! Which again, jank and noisy, but arguably
# better than a task sitting forever, never either
# completing or emitting an error message.
print(
```
jared-paperspace (author):
Report every chunk

Review comment:

This isn't too noisy now that we removed the branch around it?

jared-paperspace (author):
No, I don't think the previous one was noisy enough honestly. It feels like nothing is happening. In general, we just need a better progress bar.
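A per-chunk progress line could look like the sketch below (the function name and output format are assumptions, not the PR's code):

```python
def report_progress(uploaded, total, label):
    # Print a running byte count after every uploaded chunk so
    # long uploads visibly make progress instead of appearing hung.
    pct = 100 * uploaded / total
    print(f"{label}: {uploaded}/{total} bytes ({pct:.0f}%)")
```

A real progress bar (e.g. carriage-return updates or a TTY-aware library) would be the longer-term fix the author alludes to.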

```python
    content_type=result['mimetype'],
    dataset_version_id=dataset_version_id,
    key=result['key'])
with requests.Session() as session:
```
jared-paperspace (author) commented Jun 22, 2022:
Use connection pooling from urllib3. Previously we weren't utilizing this feature, only the context.
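As a sketch of what reusing one `requests.Session` buys: urllib3 keeps a per-host connection pool behind the session, so repeated `put` calls reuse sockets instead of reconnecting. The explicit `HTTPAdapter` pool sizes below are illustrative, not from the PR:

```python
import requests
from requests.adapters import HTTPAdapter

def make_session(pool_maxsize=8):
    # One Session shared across uploads; urllib3's connection pool
    # keeps sockets to the storage host alive between requests.
    session = requests.Session()
    session.mount('https://', HTTPAdapter(pool_connections=4,
                                          pool_maxsize=pool_maxsize))
    return session

# usage (no network performed here):
# with make_session() as session:
#     r = session.put(url, data=f, headers=headers, timeout=300)
```

Even without mounting a custom adapter, `requests.Session` pools connections by default; the adapter only tunes pool sizes for parallel workers.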

```python
# We can dynamically assign a larger part size if needed,
# but for the majority of use cases we should be fine
# as-is
part_minsize = int(15e6)
```
Review comment:

might be worth moving this byte size to a 'constant'. i see it referenced above too.

jared-paperspace (author):

yep, done
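Pulling the repeated literals up to module level would address both this and the earlier "maybe this timeout can be pulled out too" comment. A sketch with assumed constant names (the PR does not show the final spelling):

```python
# Assumed names, for illustration only.
PART_MINSIZE = int(15e6)   # 15 MB single-request / part-size threshold
PUT_TIMEOUT = 300          # 5 minutes, passed to every session.put()

def needs_multipart(size_bytes):
    # Files above the threshold go through the multipart path.
    return size_bytes > PART_MINSIZE
```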

@brodeyn left a comment:

looks good just a few nits

@jared-paperspace jared-paperspace merged commit c486182 into master Jun 22, 2022
@jared-paperspace jared-paperspace deleted the jlunde/pla-1127-dataset-upload-cli-command-not-working branch June 22, 2022 14:25

PSBOT commented Jun 22, 2022

🎉 This PR is included in version 2.0.5 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@PSBOT PSBOT added the released label Jun 22, 2022
Comment on lines +619 to +623
```python
# we +2 the number of parts since we're doing floor
# division, which will cut off any trailing part
# less than the part_minsize, AND we want to 1-index
# our range to match what AWS expects for part
# numbers
```
@ghost commented Aug 13, 2022:

Is this true? Shouldn't you use ceil? I have a `gradient version create` and `gradient files put` where it crashes when there is a file that evenly divides the 15MB chunk size (75MB). I suspect that it's trying to read an extra chunk that doesn't exist. The progress bar says 90MB/75MB when it crashes. Does that make sense? I could be misreading things.

Also, I think you mean to say that you add 1 because part numbers start counting from 1 and you need to correct for `range`'s exclusive upper bound. Not entirely sure. Could you check this out?
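The failure mode described above checks out arithmetically. This sketch contrasts the floor-division-plus-two count with a `math.ceil` count for a file that divides evenly into 15 MB parts (function names are mine, for illustration):

```python
import math

PART_MINSIZE = int(15e6)  # 15 MB

def parts_floor_plus_two(size):
    # The approach the code comment describes: floor division,
    # then +2 to cover a trailing partial part and 1-indexing.
    return size // PART_MINSIZE + 2

def parts_ceil(size):
    # ceil never over-counts, even when size divides evenly.
    return math.ceil(size / PART_MINSIZE)

size = int(75e6)                    # exactly five 15 MB parts
upper = parts_floor_plus_two(size)  # 7 -> range(1, 7) reads 6 parts
```

Six 15 MB reads is 90 MB attempted against a 75 MB file, which matches the 90MB/75MB progress-bar report.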

@ghost commented Aug 13, 2022:

Looks like this was a known bug in a previous PR but intentionally left in. 🥇 The effect of reading past the end of the file was not predicted though.

#384 (comment)

@ghost ghost mentioned this pull request Aug 15, 2022
3 participants